Advanced Voting Method for Improving Random Forest Classification Algorithm Performance in Machine Learning
Author
Abstract
The Random Forest classification algorithm is a popular ensemble learning algorithm that classifies data with a given set of attributes on the basis of majority votes from the decision trees of the forest (Breiman, Cutler, 2004). Classification by majority vote of the decision trees is not the best way to predict a class, since different decision trees may have different levels of accuracy at making predictions. We conducted an experiment on different datasets with varying numbers of attributes and compared the classification accuracy of majority voting against weighted voting, in which each tree in the forest is weighted by its performance as measured by its F1 score. Weighted voting on the basis of F1 score was found to outperform majority voting in classification accuracy. In this paper we propose that the voting mechanism in Random Forest classification should be based on the F1 scores of the decision trees.
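The difference between the two voting schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say on which data each tree's F1 score is computed or how ties are broken, so the sketch assumes a binary problem, a held-out validation set, and ties resolved in favor of the smallest label. The names `f1_binary` and `combine_votes` and all data values are hypothetical.

```python
from collections import defaultdict


def f1_binary(y_true, y_pred, positive=1):
    """F1 score of the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def combine_votes(tree_preds, weights=None):
    """Combine per-tree predictions into one label per sample.

    tree_preds: one list of predicted labels per tree.
    weights: one weight per tree; None means plain majority voting.
    Ties go to the smallest label (an assumption of this sketch).
    """
    if weights is None:
        weights = [1.0] * len(tree_preds)
    combined = []
    for sample_votes in zip(*tree_preds):
        totals = defaultdict(float)
        for label, weight in zip(sample_votes, weights):
            totals[label] += weight
        combined.append(max(sorted(totals), key=totals.get))
    return combined


# Hypothetical validation labels and each tree's predictions on them.
y_val = [1, 0, 1, 1, 0, 0]
val_preds = [
    [1, 0, 1, 1, 0, 0],  # tree A: perfect on the validation set
    [0, 1, 1, 0, 0, 1],  # tree B: weak
    [0, 0, 1, 0, 1, 0],  # tree C: weak
]
# Assumption: each tree's weight is its F1 score on the validation set.
weights = [f1_binary(y_val, preds) for preds in val_preds]

# Hypothetical predictions of the same three trees on two new samples.
new_preds = [
    [1, 0],  # tree A
    [0, 1],  # tree B
    [0, 1],  # tree C
]
majority = combine_votes(new_preds)           # every tree counts equally
weighted = combine_votes(new_preds, weights)  # trees count by F1 score

print("weights: ", [round(w, 3) for w in weights])
print("majority:", majority)
print("weighted:", weighted)
```

In this toy case the two weak trees outvote the accurate tree under majority voting, while under F1 weighting the accurate tree's weight (1.0) exceeds the combined weight of the other two (about 0.733), so the combined prediction follows it.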
Similar resources
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)
Machine learning-based classification techniques provide support for the decision making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature and their high dimensionality problem comprises in terms of slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...
Spectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms
Many studies have demonstrated that spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial classification approach for improving the spectral–spatial classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...
A Comparative Study of SVM and RF Methods for Classification of Alteration Zones Using Remotely Sensed Data
Identification and mapping of the significant alterations are the main objectives of the exploration geochemical surveys. The field study is time-consuming and costly to produce the classified maps. Therefore, the processing of remotely sensed data, which provide timely and multi-band (multi-layer) data, can be substituted for the field study. In this study, the ASTER imagery is used for altera...
Improving the performance of recommender systems in the face of the cold start problem by analyzing user behavior on social network
The goal of a recommender system is to provide desired items for users. One of the main challenges affecting the performance of recommendation systems is the cold-start problem, which occurs as a result of a lack of information about a user or item. In this article, we first present an approach that uses social streams such as Twitter to create a behavioral profile, then user profiles are clusteri...